Finding Notes in Music

What happens when we apply unsupervised learning to musical recordings?

Can we find specific notes?

Data Sources

MusicNet

     id  composer  composition               movement                   ensemble       source            transcriber                      catalog_name  seconds
0  1727  Schubert  Piano Quintet in A major  2. Andante                 Piano Quintet  European Archive  http://tirolmusic.blogspot.com/  OP114         447
1  1728  Schubert  Piano Quintet in A major  3. Scherzo: Presto         Piano Quintet  European Archive  http://tirolmusic.blogspot.com/  OP114         251
2  1729  Schubert  Piano Quintet in A major  4. Andantino - Allegretto  Piano Quintet  European Archive  http://tirolmusic.blogspot.com/  OP114         444
3  1730  Schubert  Piano Quintet in A major  5. Allegro giusto          Piano Quintet  European Archive  http://tirolmusic.blogspot.com/  OP114         368
4  1733  Schubert  Piano Sonata in A major   2. Andantino               Solo Piano     Museopen          Segundo G. Yogore                D959          546
array(['Piano Quintet', 'Solo Piano', 'Piano Trio', 'Viola Quintet',
       'String Quartet', 'Clarinet Quintet',
       'Pairs Clarinet-Horn-Bassoon', 'Wind Quintet', 'Accompanied Cello',
       'Accompanied Clarinet', 'Wind and Strings Octet', 'String Sextet',
       'Piano Quartet', 'Horn Piano Trio', 'Solo Violin', 'Solo Flute',
       'Solo Cello', 'Violin and Harpsichord',
       'Clarinet-Cello-Piano Trio', 'Accompanied Violin', 'Wind Octet'],
      dtype=object)
ensemble
Solo Cello      12
Solo Flute       3
Solo Piano     156
Solo Violin      9
Name: id, dtype: int64
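The solo-ensemble counts above come from the MusicNet metadata table. A minimal sketch of how such counts might be produced with pandas (the toy DataFrame below stands in for the real metadata CSV; the real table has many more rows and columns):

```python
import pandas as pd

# Toy stand-in for the MusicNet metadata table.
metadata = pd.DataFrame({
    'id': [2296, 2295, 2202, 1733],
    'ensemble': ['Solo Cello', 'Solo Cello', 'Solo Flute', 'Solo Piano'],
})

# Keep only solo ensembles, then count recordings per ensemble,
# as in the output above.
solo = metadata[metadata['ensemble'].str.startswith('Solo')]
counts = solo.groupby('ensemble')['id'].count()
```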

URMP

University of Rochester Multi-Modal Music Performance (URMP) Dataset

http://www2.ece.rochester.edu/projects/air/projects/URMP.html

instrument    recordings (count of file_path)
Bassoon         3
Cello          11
Clarinet       10
Double Bass     3
Flute          18
Horn            5
Oboe            6
Saxophone      11
Trombone        8
Trumpet        22
Tuba            5
Viola          13
Violin         34

Combination

URMP (13 instruments) + MusicNet (4 instruments) - overlap (3 instruments) = unique (14 instruments)

14 instruments * 3 recordings/instrument = 42 recordings
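The set arithmetic above can be checked directly with Python sets built from the two instrument tables (URMP's 13 instruments and MusicNet's 4 solo instruments):

```python
# Instrument sets taken from the tables above.
urmp = {'Bassoon', 'Cello', 'Clarinet', 'Double Bass', 'Flute', 'Horn', 'Oboe',
        'Saxophone', 'Trombone', 'Trumpet', 'Tuba', 'Viola', 'Violin'}
musicnet_solo = {'Cello', 'Flute', 'Piano', 'Violin'}

overlap = urmp & musicnet_solo   # {'Cello', 'Flute', 'Violin'}
unique = urmp | musicnet_solo    # union of both datasets

print(len(urmp), len(musicnet_solo), len(overlap), len(unique))  # 13 4 3 14
```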

Initial Experimentation (MusicNet only)

Initial process:

  1. Take 100 1-sec samples from 3 recordings of each instrument
  2. Calculate the F.T. of each 1-sec sample
  3. Dimensionality reduction
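The three steps above can be sketched as follows. This is a minimal sketch, not the notebook's actual code: a synthetic 440 Hz sine stands in for a real recording, and the 44.1 kHz sample rate is an assumption.

```python
import numpy as np

rate = 44100                                   # samples per second (assumed)
rng = np.random.default_rng(0)

# Stand-in for a loaded recording: 10 s of a 440 Hz sine plus light noise.
t = np.arange(10 * rate) / rate
audio = np.sin(2 * np.pi * 440 * t) + 0.01 * rng.standard_normal(t.size)

# 1. Take 100 random 1-second samples.
n_samples = 100
starts = rng.integers(0, audio.size - rate, size=n_samples)
samples = np.stack([audio[s:s + rate] for s in starts])

# 2. Fourier transform of each 1-sec sample (magnitude spectrum).
fts = np.abs(np.fft.rfft(samples, axis=1))
freqs = np.fft.rfftfreq(rate, d=1 / rate)

# 3. Dimensionality reduction (PCA / t-SNE / UMAP) is then applied to `fts`.
peak_freq = freqs[fts[0].argmax()]             # the peak sits at 440 Hz here
```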

Recordings:

       id  composer  composition         movement      instrument  source            transcriber        catalog_name  seconds  file_path
161  2296  Bach      Cello Suite 4       4. Sarabande  Cello       European Archive  David J. Grossman  BWV1010       291      musicnet/2296.wav
160  2295  Bach      Cello Suite 4       3. Courante   Cello       European Archive  David J. Grossman  BWV1010       259      musicnet/2295.wav
127  2218  Bach      Cello Suite 3       2. Allemande  Cello       European Archive  David J. Grossman  BWV1009       199      musicnet/2218.wav
114  2202  Bach      Partita in A minor  1. Allemande  Flute       Scott Goff        David J. Grossman  BWV1013       181      musicnet/2202.wav
115  2203  Bach      Partita in A minor  2. Corrente   Flute       Scott Goff        David J. Grossman  BWV1013       156      musicnet/2203.wav

Fourier Transform (F.T.) illustration

Realistic F.T.

PCA by Instrument

t-SNE by Instrument

UMAP by Instrument

Single Recording Example

  1. Subsample, Fourier Transform
  2. Dimensionality Reduction
  3. Cluster
  4. Identify Clusters
  5. Identify Intervals

1. Get the Fourier transforms of subsamples of an audio file.

2. Dimensionality Reduction

PCA

t-SNE

UMAP

_ = plt.figure(figsize=(12,8))
umap_df = reduce_dimension(ft, UMAP, {'n_neighbors':5, 'min_dist':0.005, 'metric':'correlation'})
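`reduce_dimension` is a notebook helper whose definition is not shown. One plausible sketch, under the assumption that it fits a 2-D embedding and returns it as a DataFrame (PCA stands in for UMAP here so the example runs with scikit-learn alone; swap in `UMAP` and its parameters as in the call above):

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

def reduce_dimension(ft, reducer_cls, params, n_components=2):
    """Fit a 2-D embedding of the spectra and return it as a DataFrame."""
    embedding = reducer_cls(n_components=n_components, **params).fit_transform(ft)
    name = reducer_cls.__name__.lower()
    return pd.DataFrame(embedding, columns=[f'{name}1', f'{name}2'])

# Usage with random stand-in spectra.
ft = np.random.default_rng(0).random((50, 200))
df = reduce_dimension(ft, PCA, {})
```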

3. Cluster

_ = plt.figure(figsize=(10,7))
umap_dbscan_df = cluster_2D(umap_df, DBSCAN, {'eps':0.6, 'min_samples':7}, show_plot=True)
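`cluster_2D` is likewise a notebook helper that is not shown. A minimal sketch, assuming it clusters the 2-D embedding and attaches the labels as a new column:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN

def cluster_2D(df, clusterer_cls, params, show_plot=False):
    """Cluster the 2-D embedding and attach the labels as a 'cluster' column."""
    coords = df.iloc[:, :2].to_numpy()
    labels = clusterer_cls(**params).fit_predict(coords)
    out = df.copy()
    out['cluster'] = labels
    if show_plot:
        import matplotlib.pyplot as plt
        plt.scatter(coords[:, 0], coords[:, 1], c=labels, s=10)
    return out

# Usage with two obvious blobs; DBSCAN should find both.
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0, 0.1, (30, 2)), rng.normal(5, 0.1, (30, 2))])
df = cluster_2D(pd.DataFrame(pts, columns=['umap1', 'umap2']), DBSCAN,
                {'eps': 0.6, 'min_samples': 7})
```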

4. Identify Clusters

Example clusters shown: 0, 1, 11, 15

Each cluster is assigned the frequency of the longest interval(s) in the cluster.

Frequency fitting:

  1. Find the most prominent peaks of the F.T.
  2. Find the tallest of those peaks.
  3. Check for peaks within 4% of 1/2 and 1/3 of the tallest peak's frequency.
  4. If a match is found, that peak is the fundamental; otherwise, the tallest peak is.
  5. Check for peaks within 4% of the 2nd through 15th harmonics of the fundamental.
  6. If any harmonic matches, return the fundamental frequency; otherwise, the fit fails.
cluster_fundamental_freqs
{1: 294.43359375,
 2: 351.5625,
 3: 442.55514705882354,
 4: 591.796875,
 5: 660.64453125,
 6: 700.78125,
 7: 796.875,
 9: 556.640625,
 10: 890.625,
 11: 938.8786764705882,
 12: 523.4375,
 15: 388.7867647058824,
 17: 494.31818181818187}
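The fitting steps above can be sketched as follows. This is a sketch, not the notebook's actual code: the peak-picking rule (local maxima above 10% of the global maximum) and the function name are assumptions.

```python
import numpy as np

def fit_fundamental(freqs, spectrum, tol=0.04, n_harmonics=15):
    """Fit a fundamental frequency to a magnitude spectrum (steps 1-6 above)."""
    y = spectrum
    # 1. Most prominent peaks: local maxima above 10% of the global maximum.
    idx = np.where((y[1:-1] > y[:-2]) & (y[1:-1] > y[2:])
                   & (y[1:-1] >= 0.1 * y.max()))[0] + 1
    if idx.size == 0:
        return None
    peak_freqs, peak_heights = freqs[idx], y[idx]
    # 2. Tallest peak.
    tallest = peak_freqs[peak_heights.argmax()]
    # 3-4. A peak near 1/2 or 1/3 of the tallest frequency is the fundamental;
    #      otherwise the tallest peak itself is.
    fundamental = tallest
    for divisor in (2, 3):
        candidate = tallest / divisor
        match = np.abs(peak_freqs - candidate) <= tol * candidate
        if match.any():
            fundamental = peak_freqs[match][0]
            break
    # 5-6. Require at least one matching harmonic (2nd-15th); else the fit fails.
    for k in range(2, n_harmonics + 1):
        if np.any(np.abs(peak_freqs - k * fundamental) <= tol * k * fundamental):
            return fundamental
    return None

# Toy spectrum: fundamental at 220 Hz, tallest peak at the 2nd harmonic (440 Hz).
freqs = np.arange(2000.0)
spectrum = np.zeros(2000)
spectrum[[220, 440, 660]] = [0.5, 1.0, 0.4]
f0 = fit_fundamental(freqs, spectrum)  # 220.0
```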

5. Identify Intervals

What if we find the frequency of each interval (instead of each cluster)?

All intervals are assigned their cluster frequency.

Intervals are fit individually. Failed fits default to cluster frequency.

Evaluation

   start     freq  duration
0  4.429  294.110     0.644
1  5.091  352.629     0.540
2  5.654  442.804     0.615
3  6.298  594.881     0.290
4  6.594  660.201     0.116

A "ground truth" notes table.

Accuracy: 12 / 16 = 0.75

Accuracy: 16 / 18 = 0.89 ?

Accuracy: 16 / 23 = 0.70 ?
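The exact matching rule behind these accuracy figures is not shown. One plausible criterion matches a detected note to a ground-truth note when onset time and frequency agree within tolerances (both tolerances here are assumptions):

```python
def note_accuracy(detected, truth, freq_tol=0.04, time_tol=0.1):
    """Fraction of ground-truth notes matched by some detection.

    `detected` and `truth` are lists of (start_seconds, frequency_hz) pairs.
    """
    matched = 0
    for t_start, t_freq in truth:
        for d_start, d_freq in detected:
            if (abs(d_start - t_start) <= time_tol
                    and abs(d_freq - t_freq) <= freq_tol * t_freq):
                matched += 1
                break
    return matched / len(truth)

# Toy example: 3 of 4 ground-truth notes are matched.
truth = [(4.4, 294.0), (5.1, 352.0), (5.7, 443.0), (6.3, 595.0)]
detected = [(4.429, 294.110), (5.091, 352.629), (5.654, 442.804)]
acc = note_accuracy(detected, truth)  # 0.75
```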

Collecting Notes by Instrument

Conclusion

Results

  • Solo audio sample → notes played in sample
  • 623 notes found in 42 recordings: ~15 notes/recording
  • Note-finding accuracy: 0.75

Model Strengths

  1. Identifies most notes accurately
  2. Robust to small pitch variations
  3. Excellent frequency resolution

Model Weaknesses

  1. Misses short, infrequent notes
  2. Cannot reliably distinguish octaves and half-steps
  3. Violin-biased

Potential Uses

  1. Evaluating musical performance
  2. Transcribing music
  3. Identifying instruments by sound

Next Steps

  1. Clustering in higher-dimensional subspaces
  2. Flexible-length intervals
  3. Full set of notes for each instrument
  4. Closer look at other instrument results

Backup

_ = plt.figure(figsize=(10,7))
umap_agglomerative_df = cluster_2D(umap_df, AgglomerativeClustering, {'n_clusters':19}, show_plot=True)
_ = plt.figure(figsize=(10,7))
umap_spectral_df = cluster_2D(umap_df, SpectralClustering, {'n_clusters':20}, show_plot=True)
_ = plt.figure(figsize=(10,7))
umap_meanshift_df = cluster_2D(umap_df, MeanShift, {'bandwidth':3}, show_plot=True)
_ = plt.figure(figsize=(12,7))
tsne = TSNE(n_components=2, perplexity=50, n_iter=500)
tsne_res = tsne.fit_transform(stfts_shaped)
tsne_res_df = pd.DataFrame(tsne_res, columns=['tsne1', 'tsne2'])
tsne_res_df['instrument'] = ins
splot = sns.scatterplot(data=tsne_res_df, x='tsne1', y='tsne2', hue='instrument', palette=sns.color_palette(palette, 14))
_ = splot.legend(bbox_to_anchor=(1.2, 1))
umap_results = UMAP(n_neighbors=7, min_dist=0.1, metric='correlation').fit_transform(stfts_shaped)
umap_df = pd.DataFrame(umap_results, columns=['umap1', 'umap2'])
umap_df['instrument'] = ins
_ = plt.figure(figsize=(12,7))
splot = sns.scatterplot(data=umap_df, x='umap1', y='umap2', hue='instrument', palette=sns.color_palette(palette, 14))
_ = splot.legend(bbox_to_anchor=(1.2, 1))